
Conversation

@seun-ja
Contributor

@seun-ja seun-ja commented Aug 20, 2025

This PR aims to resolve #3211. The remote LLM implementation now provides two options: the default and OpenAI.

The default represents the current behaviour and, as the name suggests, is used when no option is provided.

The design is extensible while preserving the original API without breaking it.

There are concerns about wasi::llm parameters that are not compatible with OpenAI's API documentation, which may necessitate adjustments to the current definitions in the wit file. Documented here.

[Update]

In reference to @rylev's comment, there is no need to change the wit file; the implementation will just use what's available.

@itowlson
Collaborator

It would be useful to see an example of how a user selects this back-end in their runtime config. It might also be useful to add some discussion to the PR of the tradeoffs you considered when bundling this in with the existing cloud-gpu remote back-end vs. having it as a separate (typed) back end.

@seun-ja seun-ja marked this pull request as ready for review August 25, 2025 20:41
@seun-ja
Contributor Author

seun-ja commented Aug 26, 2025

The PR bundles the OpenAI API backend into the existing llm-remote-http crate because:

  • Both share a lot of common resources, which avoids unnecessary cyclic import issues
  • Integrating it with the existing crate won't break the current set-up
  • It keeps the current streamlined options, Spin (local) and RemoteHttp, while still making the latter customizable and expandable via the optional field added to the runtime-config

@itowlson, I just realised that in my PR's main comment I said "breaking it"; apologies, I meant "not breaking it" 😄

pub struct RemoteHttpCompute {
    url: Url,
    auth_token: String,
    custom_llm: Option<String>,

Contributor

I think we want this to be an enum which would allow us to constrain it to only the options implemented.
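
For illustration, a minimal sketch of the enum-based approach being suggested; the variant names here are assumptions, though later snippets in this thread show a CustomLlm enum and a #[serde(default)] field, so this roughly matches where the code ended up:

use serde::Deserialize;
use url::Url;

/// Constrains the runtime-config value to the backends actually implemented,
/// rather than accepting an arbitrary string.
#[derive(Debug, Default, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum CustomLlm {
    /// The existing remote HTTP behaviour (used when no value is provided).
    #[default]
    Default,
    /// The OpenAI-compatible API added by this PR.
    OpenAi,
}

#[derive(Debug, Deserialize)]
pub struct RemoteHttpCompute {
    url: Url,
    auth_token: String,
    #[serde(default)]
    custom_llm: CustomLlm,
}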


tracing::info!("Sending remote inference request to {chat_url}");

let body = CreateChatCompletionRequest {

Contributor

I think we should use the generate API instead of chat completions here because our interface does not really lend itself to having multiple roles. The wit interface aligns more closely with the generate API than with chat completions.

Contributor Author

I have looked into this option; the endpoint is /v1/responses. While it is more compatible with the current wit interface, I noticed an unusual behaviour with this endpoint while testing it.

For example, it didn't work with gpt-4. I got this error:

"error": {
    "message": "The model `gpt-4` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
 }

But when I tried it with gpt-4o, it worked. I got a different error message, and rightly so 🫣

"error": {
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }

I tried it with Ollama using the gpt-oss:20b model and got this error:

404 page not found

I dug deeper and realised Ollama doesn't have that endpoint. Ollama does have a /generate endpoint, but it isn't compatible with what we need here. It returns this:

{
    "model": "gpt-oss:20b",
    "created_at": "2025-08-28T21:17:25.432385Z",
    "response": "",
    "done": true,
    "done_reason": "load"
}

In conclusion, I think the current setup is fine: the user role matches the expected behaviour, similar to what the default remote LLM does.
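
For illustration, a rough sketch of the request shape under discussion, with the wit prompt sent as a single user-role message. Field names follow OpenAI's public chat completions schema; the type and function names here are illustrative, not necessarily the ones in the PR:

use serde::Serialize;

#[derive(Serialize)]
struct ChatMessage {
    // Always "user" here, since wasi:llm has no notion of conversation roles.
    role: String,
    // The prompt passed through the wasi:llm interface.
    content: String,
}

#[derive(Serialize)]
struct ChatCompletionRequest {
    model: String,
    messages: Vec<ChatMessage>,
    #[serde(skip_serializing_if = "Option::is_none")]
    max_tokens: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    temperature: Option<f32>,
}

fn build_request(model: &str, prompt: &str) -> ChatCompletionRequest {
    ChatCompletionRequest {
        model: model.to_owned(),
        messages: vec![ChatMessage {
            role: "user".to_owned(),
            content: prompt.to_owned(),
        }],
        max_tokens: None,
        temperature: None,
    }
}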

    url: Url,
    auth_token: String,
    #[serde(default)]
    custom_llm: CustomLlm,

Contributor

Calling it backend would probably make more sense than custom_llm, as this refers to the backing API.


#[derive(Debug, Default, serde::Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum CustomLlm {

Contributor

This probably needs a better name, something like backend or along those lines.

Contributor

@dicej dicej left a comment

LGTM. I don't have much LLM API experience, so I can't comment on the details of how this uses the OpenAI APIs, but the structure of the code seems reasonable to me.

Is there any way we can automate testing this without necessarily spending money on an OpenAI account for CI, and without writing a mock API server from scratch (which might not be very useful anyway)?

@seun-ja
Contributor Author

seun-ja commented Sep 9, 2025

Thanks for the review @dicej.

There could be a way. I'll look into that.

On the other hand, because the API is compatible with other similar LLM APIs, such as Ollama's, which is free, testing could be done against one of those.

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from e61e83a to ccc7b34 Compare September 29, 2025 21:06
struct CreateChatCompletionResponse {
    /// A unique identifier for the chat completion.
    #[serde(rename = "id")]
    _id: String,

Collaborator

Not sure why we're deserialising these presumably unused fields? I see you'd need them with deny_unknown_fields but that doesn't seem to be turned on...?

(This seems to be a general pattern throughout. If we need it then no worries, just looks odd to me.)

}

#[derive(Deserialize)]
struct OpenAIEmbeddingUsage {

Collaborator

By the name, this seems specific to the OpenAI backend, which makes me wonder if CreateEmbeddingResponse is also OpenAI-specific. I'd expect OpenAI stuff to be in open_ai.rs but I'm not sure I'm understanding this right.

user: None,
};

let chat_url = self

Collaborator

chat_url seems a misleading name for this.


let chat_url = self
    .url
    .join("/v1/chat/completions")

Collaborator

This string appears in multiple places: should we pull it out to a const?


let chat_url = self
    .url
    .join("/v1/embeddings")

Collaborator

Again, consider creating a const
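
A sketch of the suggested refactor; EMBEDDINGS_ENDPOINT matches the constant name that appears later in this thread, while CHAT_COMPLETIONS_ENDPOINT is an assumed name:

use url::Url;

const CHAT_COMPLETIONS_ENDPOINT: &str = "/v1/chat/completions";
const EMBEDDINGS_ENDPOINT: &str = "/v1/embeddings";

// Call sites join the shared constant instead of repeating the string literal.
fn endpoint_urls(base: &Url) -> Result<(Url, Url), url::ParseError> {
    Ok((
        base.join(CHAT_COMPLETIONS_ENDPOINT)?,
        base.join(EMBEDDINGS_ENDPOINT)?,
    ))
}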

#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {
    Success(CreateChatCompletionResponse),

Collaborator

This reinforces my concern about the location of CreateChatCompletionResponse


#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateChatCompletionResponses {

Collaborator

Confused by this name. It appears to contain only one response?


#[derive(serde::Deserialize)]
#[serde(untagged)]
enum CreateEmbeddingResponses {

Collaborator

Again puzzled by the plural

Collaborator

What is this the schema for? Is it OpenAI-specific? All these message types seem a bit randomly scattered across the files: could we clarify the rhyme and reason, e.g. so maintainers know where to put new types?

Contributor Author

Yes, these are OpenAI-specific schemas (I'm probably using the word incorrectly).

I thought having it inside open_ai.rs might just make the file a bit crowded.

Collaborator

What I'd suggest:

  • Move the file into a new open_ai folder
  • Add mod schemas; to the top of open_ai.rs
  • Change references in open_ai.rs from crate::schemas to just schemas

This will make the schemas module a sub-module of the open_ai module which is a nice pattern for private stuff which would be too clutterful in the main file of the module.
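
Sketched out, the resulting structure looks roughly like the following; an inline module stands in for the separate file, and in the crate itself the former crate-level schemas.rs would become src/open_ai/schemas.rs:

// A compilable stand-in for the suggested layout: in the crate, `schemas`
// would be declared as `mod schemas;` at the top of open_ai.rs, backed by
// the file src/open_ai/schemas.rs.
mod open_ai {
    mod schemas {
        // OpenAI-specific request/response types (CreateChatCompletionRequest,
        // CreateChatCompletionResponse, ...) live here, private to open_ai.
        pub struct CreateChatCompletionRequest;
    }

    pub fn example() {
        // Within open_ai, references become `schemas::...` rather than `crate::schemas::...`.
        let _request = schemas::CreateChatCompletionRequest;
    }
}

fn main() {
    open_ai::example();
}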

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from ccc7b34 to 78faf6b Compare September 29, 2025 22:19
@seun-ja seun-ja requested a review from itowlson September 29, 2025 22:23
@@ -0,0 +1,266 @@
#![allow(dead_code)]

Collaborator

Where is the dead code? Why is it needed? If it is needed, could we limit this declaration to specific parts of the module rather than the whole thing (so that we don't lose checking on the whole module)?
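
For reference, a sketch of scoping the lint to just the item that needs it rather than the whole module (the field names here are illustrative):

use serde::Deserialize;

#[derive(Deserialize)]
struct CreateChatCompletionResponse {
    // Deserialised but never read; allow the lint on this field only,
    // instead of a module-wide #![allow(dead_code)].
    #[allow(dead_code)]
    id: String,
    choices: Vec<String>,
}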

}

#[derive(Serialize, Debug)]
struct CreateChatCompletionRequest {

Collaborator

I am still unclear on when I should put message types in this file vs when I should put them in schemas.rs.

#[derive(Deserialize)]
struct CreateChatCompletionResponse {
    /// A unique identifier for the chat completion.
    id: String,

Collaborator

I am confused. Is this why you allowed dead code? If we can't just omit the unused fields, then your previous strategy was right; but I would have thought we could omit them e.g.

struct CCCR {
  choices: ...,
  usage: ...,
}

But if that doesn't work then the _id prefix kludge is better than the dead code kludge.

Contributor Author

Deserialization would fail if these fields are not provided, which is why I can't ignore them.

I'll revert to the underscore prefix.

Collaborator

I don't understand why deserialisation would fail. By default, if there's no Rust field that matches a JSON field, the JSON field is ignored. You need to explicitly opt into the "fail on unknown JSON" behaviour, because you never know when the server will add a new field; and as far as I can tell you haven't done that. (You had me doubting myself but I tested it: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=14e024bd7b123e568dd368bc31dcbe95)

I feel like I'm missing something here - could you clarify the nature of the failure? Thanks!
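
For reference, a minimal sketch of the default serde behaviour the playground link demonstrates: JSON fields with no matching Rust field are silently dropped unless the type opts into #[serde(deny_unknown_fields)]:

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Trimmed {
    choices: Vec<String>,
}

fn main() {
    // `id` and `usage` have no matching Rust field, yet deserialisation succeeds.
    let json = r#"{ "id": "cmpl-123", "choices": ["hello"], "usage": { "total_tokens": 3 } }"#;
    let trimmed: Trimmed = serde_json::from_str(json).unwrap();
    println!("{trimmed:?}"); // Trimmed { choices: ["hello"] }
}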

Contributor Author

You're absolutely right, apologies.

I was confusing it with some issues I encountered earlier in this PR.

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from 78faf6b to 6ce4fec Compare September 29, 2025 23:31
@seun-ja seun-ja requested a review from itowlson September 29, 2025 23:33

let url = self
    .url
    .join("EMBEDDINGS_ENDPOINT")

Collaborator

Suggested change
.join("EMBEDDINGS_ENDPOINT")
.join(EMBEDDINGS_ENDPOINT)

This raises some concerns about testing for me: I can't see how this could have worked.

I appreciate automated testing is hard for a service like this but can you confirm that you've tested the current code manually (both OpenAI and the changes to classic) and maybe capture what tests you performed?

Contributor Author

Thanks for catching that.

For testing, I manually tested using the example provided in this PR [./examples/open-ai-rust]. It only tests inference, and I still used it to test before the last commit.

For embeddings, I also used it but modified it slightly to ensure it worked, running Ollama locally (their API is modelled closely on OpenAI's).

Collaborator

@itowlson itowlson left a comment

LGTM apart from that typo: once that's fixed and you've fully re-run your tests, this should be good to go.

@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from 6ce4fec to b8f9973 Compare September 29, 2025 23:57
@seun-ja seun-ja requested a review from itowlson September 30, 2025 00:19
impl From<CreateChatCompletionResponse> for wasi_llm::InferencingResult {
    fn from(value: CreateChatCompletionResponse) -> Self {
        Self {
            text: value.choices[0].message.content.clone(),

Collaborator

Do we need to defend against choices being empty? Or is it guaranteed to have an element?

Contributor Author

While the documentation doesn't explicitly state that there will be at least one, a successful response is expected to always contain at least one.

I suppose it costs nothing to add a safeguard against such an event.
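
A sketch of such a safeguard, swapping the indexing for a fallible lookup (the error handling and type definitions here are illustrative, not the PR's actual ones):

// Illustrative stand-ins for the response types discussed above.
struct Message { content: String }
struct Choice { message: Message }
struct CreateChatCompletionResponse { choices: Vec<Choice> }

fn first_choice_text(response: &CreateChatCompletionResponse) -> Result<String, String> {
    // A successful response is expected to contain at least one choice,
    // but avoid panicking if it somehow doesn't.
    response
        .choices
        .first()
        .map(|choice| choice.message.content.clone())
        .ok_or_else(|| "chat completion response contained no choices".to_owned())
}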

While preserving the default HTTP client, this introduces an OpenAI client type which also supports APIs that follow OpenAI's specs. It also includes an example.

Signed-off-by: Aminu Oluwaseun Joshua <[email protected]>
@seun-ja seun-ja force-pushed the implement-open-ai-api-llm-backend branch from b8f9973 to 4226ea7 Compare September 30, 2025 00:59
@seun-ja seun-ja requested a review from itowlson September 30, 2025 01:00
@itowlson itowlson merged commit 4394e4b into spinframework:main Sep 30, 2025
17 checks passed
@seun-ja seun-ja deleted the implement-open-ai-api-llm-backend branch September 30, 2025 05:14
@karthik2804
Contributor

I am late here, but to answer:

Is there any way we can automate testing this without necessarily spending money on an OpenAI account for CI, and without writing a mock API server from scratch (which might not be very useful anyway)?

We could technically run Ollama with a small model to verify this in CI integration tests, but I am not sure whether it is worth it.

